[spark] Support merge schema in MERGE INTO#7789

Merged
JingsongLi merged 3 commits into apache:master from Zouxxyy:dev/merge-update on May 11, 2026

@Zouxxyy Zouxxyy commented May 8, 2026

Purpose

Add schema evolution support for MERGE INTO and fix nested-field alignment.

  • With spark.paimon.write.merge-schema=true, UPDATE * / INSERT * evolve the target schema to include new source columns. Star clauses pull the new columns from the source by name; explicit clauses fill them with NULL.
  • A FROM_STAR TreeNodeTag preserves the original star intent, so a fully-listed explicit clause is not mistaken for *.
  • AssignmentAlignmentHelper now reorders nested struct / array / map fields by name.
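
To illustrate the by-name alignment of nested fields described in the last bullet, here is a minimal sketch in plain Python. It is a toy model, not the code in AssignmentAlignmentHelper: the type encoding (a dict for a struct, ("array", elem) and ("map", key, value) tuples) and the function name align are assumptions made for this example only.

```python
def align(value, target_type):
    """Reorder `value` so nested struct fields follow `target_type`, matched by name.

    Hypothetical type encoding (an assumption for this sketch):
      struct -> dict {field_name: field_type}
      array  -> ("array", element_type)
      map    -> ("map", key_type, value_type)
      anything else -> primitive
    """
    if value is None:
        return None
    if isinstance(target_type, dict):
        # Struct: emit fields in target order; fields missing from the value become None.
        return {name: align(value.get(name), ftype) for name, ftype in target_type.items()}
    if isinstance(target_type, tuple) and target_type[0] == "array":
        # Array: align every element against the element type.
        return [align(item, target_type[1]) for item in value]
    if isinstance(target_type, tuple) and target_type[0] == "map":
        # Map: keys are kept as-is, values are aligned against the value type.
        return {key: align(item, target_type[2]) for key, item in value.items()}
    return value  # primitive: nothing to reorder


target = {
    "id": "int",
    "info": {"a": "int", "b": "string"},
    "tags": ("array", {"x": "int", "y": "int"}),
}
source_row = {"info": {"b": "s", "a": 1}, "tags": [{"y": 2, "x": 1}], "id": 7}
aligned = align(source_row, target)
```

Matching by name rather than by position means a source struct that lists b before a still lands in the target's a, b order, at any nesting depth.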

Scope

  • UPDATE * / INSERT * → evolve
  • Explicit clauses → no evolve
  • Mixed → evolve, star pulls source, explicit fills NULL
  • Nested struct / array new fields
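
The scope rules above can be sketched as a small decision model. This is a hedged illustration in plain Python, not Paimon's implementation: the clause dictionaries, the from_star flag (standing in for the FROM_STAR tag), and the helper names are all hypothetical, and only UPDATE-style clauses are modelled.

```python
def should_evolve(merge_schema, clauses):
    # Evolve only when the option is on and at least one clause was written as `*`.
    return bool(merge_schema) and any(c["from_star"] for c in clauses)


def evolved_columns(target_cols, source_cols, merge_schema, clauses):
    if not should_evolve(merge_schema, clauses):
        return list(target_cols)
    # Append source columns that the target does not have yet.
    return list(target_cols) + [c for c in source_cols if c not in target_cols]


def apply_update(clause, columns, target_row, source_row):
    out = {}
    for col in columns:
        if clause["from_star"]:
            out[col] = source_row.get(col)  # star: pull from source by name
        elif col in clause["assignments"]:
            out[col] = source_row[clause["assignments"][col]]  # explicit assignment
        else:
            out[col] = target_row.get(col)  # untouched; None for newly added columns
    return out


star = {"from_star": True, "assignments": {}}
# Lists every target column, yet must not be treated as `*`.
explicit = {"from_star": False, "assignments": {"id": "id", "v": "v"}}

cols = evolved_columns(["id", "v"], ["id", "v", "extra"], True, [star, explicit])
row = apply_update(explicit, cols, {"id": 1, "v": "old"}, {"id": 1, "v": "new", "extra": 9})
```

Keeping the star intent as an explicit flag, instead of inferring it from the assignment list, is what lets the fully-listed explicit clause stay on the no-evolve path.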

Tests

13 new cases in MergeIntoTableTestBase plus WHEN NOT MATCHED BY SOURCE coverage in MergeIntoNotMatchedBySourceTest.

@Zouxxyy Zouxxyy marked this pull request as draft May 8, 2026 15:04
@Zouxxyy Zouxxyy force-pushed the dev/merge-update branch from 688a4bd to aeb72e2 Compare May 8, 2026 16:54
@Zouxxyy Zouxxyy marked this pull request as ready for review May 9, 2026 00:40

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit 347c45e into apache:master May 11, 2026
12 checks passed
JingsongLi pushed a commit that referenced this pull request May 13, 2026
…7843)

Follow-up of #7789. Extend `SchemaMergingUtils.diffSchemaChanges` to
emit precise `AddColumn` for new struct fields nested inside `ARRAY`
element / `MAP` value, instead of a coarse `UpdateColumnType` rewriting
the whole nested type.

Falls back to `UpdateColumnType` when a nested field is removed or a map
key type changes.
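
The behaviour described in this follow-up can be illustrated with a toy schema diff in plain Python. It is only a sketch under stated assumptions, not the body of `SchemaMergingUtils.diffSchemaChanges`: the type encoding, the "element" / "value" path segments, the change tuples, and the choice to treat any other type change as a whole-column rewrite are all hypothetical.

```python
class NeedsRewrite(Exception):
    """Raised when a change cannot be expressed as nested column additions."""


def _is(kind, t):
    return isinstance(t, tuple) and t[0] == kind


def nested_adds(old, new, path):
    # Types: dict = struct, ("array", elem), ("map", key, value), str = primitive.
    if old == new:
        return []
    if isinstance(old, dict) and isinstance(new, dict):
        if set(old) - set(new):
            raise NeedsRewrite  # a nested field was removed
        changes = []
        for name, ftype in new.items():
            if name not in old:
                changes.append(("add_column", path + (name,)))
            else:
                changes.extend(nested_adds(old[name], ftype, path + (name,)))
        return changes
    if _is("array", old) and _is("array", new):
        return nested_adds(old[1], new[1], path + ("element",))
    if _is("map", old) and _is("map", new):
        if old[1] != new[1]:
            raise NeedsRewrite  # map key type changed
        return nested_adds(old[2], new[2], path + ("value",))
    raise NeedsRewrite  # toy simplification: any other change rewrites the column


def diff_schema(old_cols, new_cols):
    changes = []
    for name, new_type in new_cols.items():
        if name not in old_cols:
            changes.append(("add_column", (name,)))
            continue
        try:
            changes.extend(nested_adds(old_cols[name], new_type, (name,)))
        except NeedsRewrite:
            # Coarse fallback: replace the whole column type.
            changes.append(("update_column_type", (name,)))
    return changes


old = {"id": "int", "items": ("array", {"sku": "string"})}
new = {"id": "int", "items": ("array", {"sku": "string", "qty": "int"})}
changes = diff_schema(old, new)
```

In this model a new field inside an array element or map value yields one targeted add_column entry, while a removed nested field or a changed map key type collapses to a single update_column_type for the enclosing column.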